The First Cross-Script Code-Mixed Question Answering Corpus

Authors

Somnath Banerjee

Sudip Kumar Naskar

Paolo Rosso

Sivaji Bandyopadhyay

Abstract

In this paper, we formally introduce the problem of cross-script code-mixed question answering (QA) and describe the corpus acquisition process and an evaluation strategy for this problem. Today, social media platforms are flooded with millions of posts every day on various topics. This paper, for the first time, uses such ever-growing user-generated content as an information source for the QA task in a low-resource language. A majority of these posts are multilingual in nature, and many of them involve code mixing. The multilingual aspect of social media content is reflected both in the use of words from multiple languages and in the writing script. For ease of use, multilingual users often pose questions in a non-native script. Focusing on this multilingual scenario, code-mixed cross-script (i.e., non-native script) data give rise to a new problem and present serious challenges to automatic QA. In this work, Bengali is considered the native language and English the non-native language. However, the dataset construction approach presented in this paper is generic and could be applied to any other language pair. Apart from introducing this novel problem, this paper describes the corpus development process and a suitable evaluation framework.

Similar resources

Code Mixed Cross Script Question Classification

With the growth of our society, one of the most affected aspects of our routine life is language. We tend to mix more than one language in our conversations, and mixing a regional language with English is a particularly common practice. This mixing of languages is referred to as code mixing, where we mix different linguistic constituents such as phrases, proper nouns, morphemes, etc. to come...

Modeling Classifier for Code Mixed Cross Script Questions

With a boom in the internet, social media text has been increasing day by day, and user-generated content (such as tweets and blogs) in Indian languages is written using Roman script due to various socio-cultural and technological reasons. A majority of these posts are multilingual in nature and many involve code mixing, where lexical items and grammatical features from two languages app...

Introducing a Religious Question Answering Corpus in the Persian Language

Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Due to growing interest in this field of research, the need for appropriate data sources has become apparent. Most research on developing question answering corpora has so far been done in English, but in other languages such as Persian, the lack of these co...

NLP-NITMZ @ MSIR 2016 System for Code-Mixed Cross-Script Question Classification

This paper describes our approach to the Code-Mixed Cross-Script Question Classification task, which is subtask 1 of MSIR 2016. MSIR (Mixed Script Information Retrieval) is an event held in conjunction with FIRE 2016, the 8th meeting of the Forum for Information Retrieval Evaluation. For this task, our team NLP-NITMZ submitted three system runs: i) using a direct feature set; ii) using dire...

Amrita-CEN@MSIR-FIRE2016: Code-Mixed Question Classification using BoWs and RNN Embeddings

Question classification is a key task in many question answering applications. Nearly all previous work on question classification has used machine learning and knowledge-based methods. This working note presents an embedding-based Bag-of-Words method and a Recurrent Neural Network to achieve automatic question classification in code-mixed Bengali-English text. We build two systems that clas...

Publication date: 2016
